A Challenge Set for Advancing Language Modeling
Abstract
In this paper, we describe a new, publicly available corpus intended to stimulate research into language modeling techniques that are sensitive to overall sentence coherence. The task uses the Scholastic Aptitude Test’s sentence completion format. The test set consists of 1040 sentences, each of which is missing a content word. The goal is to select the correct replacement from among five alternatives. In general, all of the options are syntactically valid and reasonable with respect to local N-gram statistics. The set was built by using an N-gram language model to produce a long list of likely words given the immediate context. These options were then hand-groomed to identify four decoys that are globally incoherent yet syntactically correct. To ensure the right to public distribution, all the data is derived from out-of-copyright materials from Project Gutenberg. The test sentences were derived from five of Conan Doyle’s Sherlock Holmes novels, and we provide a large set of nineteenth- and early twentieth-century texts as training material.
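As a rough illustration of the task format, the sketch below scores each of five candidate words by the add-one smoothed bigram log-probability of the completed sentence and picks the highest. It is not the authors' generation or evaluation pipeline. The toy training sentences, the `_____` blank marker, and the function names `train_bigram`, `log_prob`, and `choose` are all assumptions made for this example.

```python
import math
from collections import Counter

BLANK = "_____"  # hypothetical marker for the missing word


def train_bigram(sentences):
    """Count unigrams and bigrams over whitespace-tokenized sentences."""
    unigrams, bigrams = Counter(), Counter()
    for sent in sentences:
        tokens = ["<s>"] + sent.lower().split() + ["</s>"]
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def log_prob(sentence, unigrams, bigrams):
    """Add-one smoothed bigram log-probability of a whole sentence."""
    vocab_size = len(unigrams)
    tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
    total = 0.0
    for prev, cur in zip(tokens, tokens[1:]):
        total += math.log((bigrams[(prev, cur)] + 1) / (unigrams[prev] + vocab_size))
    return total


def choose(template, candidates, unigrams, bigrams):
    """Return the candidate whose filled-in sentence scores highest."""
    return max(
        candidates,
        key=lambda word: log_prob(template.replace(BLANK, word), unigrams, bigrams),
    )


# Toy training data, invented for this sketch (not taken from the corpus).
toy_corpus = [
    "the detective examined the room",
    "the detective examined the letter",
    "he examined the room carefully",
    "she opened the letter",
]
unigrams, bigrams = train_bigram(toy_corpus)

question = "the detective _____ the room"
options = ["examined", "opened", "letter", "carefully", "she"]
print(choose(question, options, unigrams, bigrams))
```

A scorer of this kind sees only adjacent word pairs. Because the decoys in the challenge set were chosen to look plausible under local N-gram statistics, a model like this one is the sort of baseline the set is meant to push beyond.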
Publication date: 2012